INSTANCE-BASED LEARNING: Nearest Neighbour with Generalisation
Authors
Abstract
Instance-based learning is a machine learning method that classifies new examples by comparing them to examples already seen and held in memory. There are two main forms of instance-based learning: nearest neighbour and case-based reasoning. Of these, nearest neighbour fell into disfavour during the 1980s but has recently regained popularity because of its simplicity and ease of implementation. Nearest neighbour learning is not without problems. It is difficult to define a distance function that works well for both discrete and continuous attributes, and noise and irrelevant attributes also cause difficulty. Finally, the specificity bias adopted by instance-based learning, while often an advantage, can over-represent small rules at the expense of more general concepts, leading to a marked decrease in classification performance in some domains. Generalised exemplars offer a solution: examples that share the same class are grouped together, so large rules are represented more fully. This reduces the role of the distance function to determining the class only when no rule covers the new example, which cuts the number of classification errors caused by inaccuracies in the distance function and increases the influence of large rules while still representing small ones. This thesis investigates non-nested generalised exemplars as a way of improving the performance of nearest neighbour. The method is tested on benchmark domains and the results are compared with documented results for ungeneralised exemplars, nested generalised exemplars, rule-induction methods and a composite rule-induction and nearest-neighbour learner. The benefits of generalisation are isolated and the performance improvement is measured. The results show that non-nested generalisation of exemplars improves the classification performance of nearest neighbour systems and reduces classification time.
Acknowledgements
… who have provided help over the past year. Grateful thanks must go to my supervisor, Ian Witten, for giving me the freedom to pursue my studies independently while always being there when needed, and to my partner, Suky, for her unending tolerance and support. Thank you also to the other machine learning students at the University of Waikato, for your valuable assistance and friendship.
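To make the thesis's central idea concrete, here is a minimal sketch of classification with generalised exemplars, assuming numeric attributes only, axis-parallel hyperrectangles as the generalised exemplars, and a Euclidean fallback distance. All class labels, values and names are illustrative; this is a sketch of the general idea, not the thesis's actual implementation.

# Minimal sketch: generalised exemplars as hyperrectangles with a
# distance-based fallback (illustrative, not the thesis's code).
import math

class Exemplar:
    def __init__(self, lower, upper, label):
        self.lower = list(lower)   # per-attribute lower bounds
        self.upper = list(upper)   # per-attribute upper bounds
        self.label = label

    def covers(self, x):
        # True when x lies inside the hyperrectangle on every attribute.
        return all(lo <= v <= hi for v, lo, hi in zip(x, self.lower, self.upper))

    def distance(self, x):
        # Euclidean distance to the nearest face; zero inside the rectangle.
        d2 = 0.0
        for v, lo, hi in zip(x, self.lower, self.upper):
            if v < lo:
                d2 += (lo - v) ** 2
            elif v > hi:
                d2 += (v - hi) ** 2
        return math.sqrt(d2)

def classify(exemplars, x):
    # Rule-like behaviour first: any exemplar that covers x decides the class.
    for ex in exemplars:
        if ex.covers(x):
            return ex.label
    # Fallback: nearest exemplar by distance when nothing covers x.
    return min(exemplars, key=lambda ex: ex.distance(x)).label

# Usage: two generalised exemplars and one point exemplar (equal bounds).
exemplars = [
    Exemplar([0.0, 0.0], [0.5, 0.5], "a"),
    Exemplar([0.6, 0.6], [1.0, 1.0], "b"),
    Exemplar([0.2, 0.9], [0.2, 0.9], "a"),
]
print(classify(exemplars, [0.3, 0.1]))   # covered by the first exemplar -> "a"
print(classify(exemplars, [0.55, 0.8]))  # covered by none -> nearest exemplar

The point of the structure is visible in classify: the distance function only matters when no exemplar covers the query, which is exactly the reduced role the abstract describes.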
Similar articles
Adaptive Distance Metrics for Nearest Neighbour Classification Based on Genetic Programming
Nearest Neighbour (NN) classification is a widely-used, effective method for both binary and multi-class problems. It relies on the assumption that class conditional probabilities are locally constant. However, this assumption becomes invalid in high dimensions, and severe bias can be introduced, which degrades the performance of the method. The employment of a locally adaptive distance metric ...
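As a purely illustrative sketch of the simplest kind of adaptive metric, the snippet below computes a per-feature weighted Euclidean distance; the weights are assumed to come from some external optimiser (for example an evolutionary search), and neither the function name nor the values are taken from the paper above.

# Sketch of a per-feature weighted Euclidean distance. The weights here are
# assumed to be supplied by an external optimiser and are illustrative values.
import math

def weighted_distance(x, y, weights):
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

weights = [2.0, 0.5, 0.0]   # third attribute treated as irrelevant
print(weighted_distance([1.0, 4.0, 7.0], [2.0, 2.0, 9.0], weights))  # 2.0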
Generalized K-Nearest Neighbour Algorithm - A Predicting Tool
The k-nearest neighbour algorithm is a non-parametric machine learning algorithm generally used for classification; it is also known as instance-based learning or lazy learning. The k-NN algorithm can also be adapted for regression, that is, for estimating continuous variables. In this research paper the researchers present a generalized k-nearest neighbour algorithm used for predicting a continuous value. In or...
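A minimal sketch of how k-NN can be adapted for regression, assuming numeric features, an unweighted Euclidean distance and a plain average of the k nearest targets; the data and names are illustrative and do not reproduce the generalized algorithm proposed in the paper.

# Sketch of k-nearest-neighbour regression: predict the mean target value
# of the k closest training points.
import math

def knn_regress(train, query, k=3):
    # train is a list of (feature_vector, target) pairs.
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

train = [([1.0], 10.0), ([2.0], 12.0), ([3.0], 15.0), ([8.0], 40.0)]
print(knn_regress(train, [2.5], k=3))   # average of the 3 closest targets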
Nearest Neighbour Distance Matrix Classification
Distance-based classification is one of the popular methods for classifying instances using a point-to-point distance based on the nearest neighbour or k-nearest neighbour (k-NN) rule. The distance measure can be any of the various measures available (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other specific distance measures). In this paper, we propose ...
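For reference, a small sketch of two of the point-to-point distance measures mentioned above (Euclidean and Manhattan) on plain numeric vectors; this is standard textbook material, not the specific distance-matrix method proposed in the paper.

# Two common point-to-point distance measures used in nearest-neighbour
# classification.
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean([0.0, 0.0], [3.0, 4.0]))   # 5.0
print(manhattan([0.0, 0.0], [3.0, 4.0]))   # 7.0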
Multiple Instance Learning with Genetic Programming for Web Mining
The aim of this paper is to present a new tool for multiple instance learning, designed using a grammar-based genetic programming (GGP) algorithm. We study its application in a Web Mining framework to identify web pages that are interesting to users. This new tool, called the GGP-MI algorithm, is evaluated and compared with other available algorithms which extend a well-known neighborhood-based algo...
Data Reduction for Instance-Based Learning Using Entropy-Based Partitioning
Instance-based learning methods such as the nearest neighbor classifier have proven to perform well in pattern classification in several fields. Despite their high classification accuracy, they suffer from high storage requirements, computational cost, and sensitivity to noise. In this paper, we present a data reduction method for instance-based learning, based on entropy-based partitioning an...
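As a hedged illustration of the quantity such a method works with, the sketch below computes the Shannon entropy of the class labels in a partition; the partitioning strategy itself is not shown, and the example is not taken from the paper.

# Shannon entropy of the class labels within a partition; an entropy-based
# partitioning scheme would favour partitions with low label entropy.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["a", "a", "a", "b"]))   # ~0.811 bits
print(entropy(["a", "a", "a", "a"]))   # 0.0 (pure partition)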
Publication date: 1995